A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
نویسندگان
چکیده
Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptative methods for solving Markovian decision problems in innnite-horizon when no model is available. In this article we consider the particular framework of non-stationary nite-horizon Markov Decision Processes. After establishing a relationship between the nite-horizon total reward criterion and the average-reward criterion in nite-horizon, we deene Q H-Learning and R H-Learning for nite-horizon MDPs. Then we introduce the Ordinary Diierential Equation (ODE) method to conduct a learning rate analysis of Q H-Learning and R H-Learning. R H-Learning appears to be a version of Q H-Learning with matrix-valued step-sizes, the corresponding gain matrix being very close to the optimal matrix which results from the ODE analysis. Experimental results connrm that performance hierarchy.
منابع مشابه
Reinforcement Learning with Time
This paper steps back from the standard infinite horizon formulation of reinforcement learning problems to consider the simpler case of finite horizon problems. Although finite horizon problems may be solved using infinite horizon learning algorithms by recasting the problem as an infinite horizon problem over a state space extended to include time, we show that such an application of infinite ...
متن کاملReinforcement Learning in Neural Networks: A Survey
In recent years, researches on reinforcement learning (RL) have focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...
متن کاملReinforcement Learning in Neural Networks: A Survey
In recent years, researches on reinforcement learning (RL) have focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...
متن کاملMulticast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach
Wireless Sensor Networks (WSNs) are consist of independent distributed sensors with storing, processing, sensing and communication capabilities to monitor physical or environmental conditions. There are number of challenges in WSNs because of limitation of battery power, communications, computation and storage space. In the recent years, computational intelligence approaches such as evolutionar...
متن کاملNearly Optimal Exploration-Exploitation Decision Thresholds
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds for the multi-armed bandit problem, one for the infinite horizon discounted reward case and one for the finite horizon undiscounted reward case are derived, which make the link between the reward horizon, uncertainty ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998